Conversation
|
I wanted to start a discussion for some time to add such a trait, but I would really keep this feature internally for now until it is sufficiently baked. I would really like this feature to make |
davebayer
left a comment
There was a problem hiding this comment.
I am not a fan of this trait. We bend C++ rules to fix poorly designed nvfp types. We would have to basically change every use of is_trivially_copyable and is_trivially_copy_constructible to this trait to make it work consistently in libcu++.
I don't think this is a good idea, we should rather insist of the nvfp types being fixed.
docs/libcudacxx/extended_api/type_traits/is_trivially_copyable_relaxed.rst
Outdated
Show resolved
Hide resolved
| Users may specialize ``cuda::is_trivially_copyable_relaxed`` for their own types whose memory representation is safe to copy | ||
| with ``memcpy`` but that the compiler does not consider trivially copyable. |
There was a problem hiding this comment.
not for the types that we care about. Said that, the user could provide an object that triggers UB. I can highlight this point in the documentation but we cannot do anything to explicitly prevent it.
Integrated and extended the tests that you point out. Everything works |
the actual issue is that the user can extend the trait to custom types. It needs to be public |
there are no plan for that. nvfp types are not trivially copyable for optimization purposes. |
This isn't going to happen and we shouldn't delude ourselves into thinking it ever will. |
This comment has been minimized.
This comment has been minimized.
docs/libcudacxx/extended_api/type_traits/is_trivially_copyable_relaxed.rst
Outdated
Show resolved
Hide resolved
docs/libcudacxx/extended_api/type_traits/is_trivially_copyable_relaxed.rst
Outdated
Show resolved
Hide resolved
docs/libcudacxx/extended_api/type_traits/is_trivially_copyable_relaxed.rst
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/__type_traits/is_trivially_copyable_relaxed.h
Outdated
Show resolved
Hide resolved
…_relaxed.rst Co-authored-by: Giannis Gonidelis <gonidelis@hotmail.com>
…_relaxed.rst Co-authored-by: Giannis Gonidelis <gonidelis@hotmail.com>
This comment has been minimized.
This comment has been minimized.
cuda::is_trivially_copyable_relaxedcuda::is_trivially_copyable
This comment has been minimized.
This comment has been minimized.
libcudacxx/test/libcudacxx/cuda/type_traits/is_trivially_copyable.aggr.pass.cpp
Show resolved
Hide resolved
docs/libcudacxx/extended_api/type_traits/is_trivially_copyable.rst
Outdated
Show resolved
Hide resolved
docs/libcudacxx/extended_api/type_traits/is_trivially_copyable.rst
Outdated
Show resolved
Hide resolved
| [[nodiscard]] _CCCL_API inline _CCCL_CONSTEXPR_BIT_CAST _To bit_cast(const _From& __from) noexcept | ||
| { | ||
| #if defined(_CCCL_BUILTIN_BIT_CAST) | ||
| if constexpr (::cuda::std::is_trivially_copyable_v<_To> && ::cuda::std::is_trivially_copyable_v<_From>) |
There was a problem hiding this comment.
here the trick. We know that __nv_bfloat2 and __half2 are not trivially copyable but, we also know, that these types are not available in constant expressions so we can dispatch depending on this property
😬 CI Workflow Results🟥 Finished in 4h 08m: Pass: 87%/99 | Total: 5d 00h | Max: 4h 07m | Hits: 37%/249239See results here. |
| @@ -0,0 +1,84 @@ | |||
| //===----------------------------------------------------------------------===// | |||
| // | |||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | |||
There was a problem hiding this comment.
Should be the libcu++ license
| _CCCL_BEGIN_NAMESPACE_CUDA | ||
|
|
||
| template <typename _Tp, typename = void> | ||
| constexpr bool __is_aggregate_trivially_copyable_v = false; |
There was a problem hiding this comment.
Critical: All those need to be inline constexpr
| #if defined(_CCCL_BUILTIN_BIT_CAST) | ||
| if constexpr (::cuda::std::is_trivially_copyable_v<_To> && ::cuda::std::is_trivially_copyable_v<_From>) | ||
| { | ||
| return _CCCL_BUILTIN_BIT_CAST(_To, __from); | ||
| } | ||
| else | ||
| { | ||
| return ::cuda::std::__bit_cast_memcpy<_To>(__from); | ||
| } | ||
| #else // ^^^ _CCCL_BUILTIN_BIT_CAST ^^^ / vvv !_CCCL_BUILTIN_BIT_CAST vvv | ||
| return ::cuda::std::__bit_cast_memcpy<_To>(__from); | ||
| #endif // !_CCCL_BUILTIN_BIT_CAST |
There was a problem hiding this comment.
We can shorten this
| #if defined(_CCCL_BUILTIN_BIT_CAST) | |
| if constexpr (::cuda::std::is_trivially_copyable_v<_To> && ::cuda::std::is_trivially_copyable_v<_From>) | |
| { | |
| return _CCCL_BUILTIN_BIT_CAST(_To, __from); | |
| } | |
| else | |
| { | |
| return ::cuda::std::__bit_cast_memcpy<_To>(__from); | |
| } | |
| #else // ^^^ _CCCL_BUILTIN_BIT_CAST ^^^ / vvv !_CCCL_BUILTIN_BIT_CAST vvv | |
| return ::cuda::std::__bit_cast_memcpy<_To>(__from); | |
| #endif // !_CCCL_BUILTIN_BIT_CAST | |
| #if defined(_CCCL_BUILTIN_BIT_CAST) | |
| if constexpr (::cuda::std::is_trivially_copyable_v<_To> && ::cuda::std::is_trivially_copyable_v<_From>) | |
| { | |
| return _CCCL_BUILTIN_BIT_CAST(_To, __from); | |
| } | |
| else | |
| #endif // _CCCL_BUILTIN_BIT_CAST | |
| { | |
| return ::cuda::std::__bit_cast_memcpy<_To>(__from); | |
| } |
| #if defined(_CCCL_BUILTIN_STRUCTURED_BINDING_SIZE) | ||
|
|
||
| // Returns the number of aggregate members, or `-1` if the type is not an aggregate. | ||
| template <typename _Tp, ::cuda::std::enable_if_t<_CCCL_BUILTIN_STRUCTURED_BINDING_SIZE(_Tp) >= 0, int> = 0> |
There was a problem hiding this comment.
Critical: This is not defined in libcu++ but only in cudax currently.
We need to move the builtin definition into libcu++
|
|
||
| // Returns the number of aggregate members, or `-1` if the type is not an aggregate. | ||
| template <typename _Tp, ::cuda::std::enable_if_t<_CCCL_BUILTIN_STRUCTURED_BINDING_SIZE(_Tp) >= 0, int> = 0> | ||
| constexpr int __aggregate_arity_v = _CCCL_BUILTIN_STRUCTURED_BINDING_SIZE(_Tp); |
There was a problem hiding this comment.
Critical: missing inline
| constexpr int __aggregate_arity_v = _CCCL_BUILTIN_STRUCTURED_BINDING_SIZE(_Tp); | |
| inline constexpr int __aggregate_arity_v = _CCCL_BUILTIN_STRUCTURED_BINDING_SIZE(_Tp); |
|
|
||
| // Returns the number of aggregate members, or `-1` if the type is not an aggregate. | ||
| template <typename _Tp> | ||
| constexpr int __aggregate_arity_v = int{sizeof(*__aggregate_arity_impl<_Tp>{}())} - 2; |
There was a problem hiding this comment.
Critical: missing inline
| constexpr int __aggregate_arity_v = int{sizeof(*__aggregate_arity_impl<_Tp>{}())} - 2; | |
| inline constexpr int __aggregate_arity_v = int{sizeof(*__aggregate_arity_impl<_Tp>{}())} - 2; |
There was a problem hiding this comment.
Please move this out of the PR.
This is a nontrivial piece of code and the current implementation seems really costly.
I want this in a separate PR so that we can properly review it
| static_assert(cuda::is_trivially_copyable<const _Tp>::value); | ||
| static_assert(cuda::is_trivially_copyable_v<_Tp>); | ||
| static_assert(cuda::is_trivially_copyable_v<const _Tp>); | ||
| } |
There was a problem hiding this comment.
Important: This should test that we can perform a bit_cast from this type to a similarly sized type
| static_assert(cuda::is_trivially_copyable<T>::value); | ||
| static_assert(cuda::is_trivially_copyable<const T>::value); | ||
| static_assert(cuda::is_trivially_copyable_v<T>); | ||
| static_assert(cuda::is_trivially_copyable_v<const T>); |
There was a problem hiding this comment.
Missing tests for bitcast back and forth
Description
Followup of the discussion in
warp_shuffleoriginal behavior (revert #8210) #8254The PR introduces
cuda::is_trivially_copyable_relaxedto support types that are actually trivially copyable but not recognized by the C++std::is_trivially_copyabletype trait.The new trait supports:
cuda::std::array.cuda::std::pair.cuda::std::tuple.Potentially affected paths by
std::is_trivially_copyableunsupported cases:
cuda/std/__atomic/types/base.h:39cuda/std/__atomic/types/reference.h:39cuda/__memcpy_async/memcpy_async_barrier.h:61cuda/__memcpy_async/memcpy_async_tx.h:58cuda/__container/buffer.h:111cuda/__algorithm/copy.h:73cuda/__algorithm/fill.h:42cuda/std/string_view:147fallback to slower path:
cuda/std/__algorithm/copy.h:102cuda/std/__algorithm/copy_backward.h:50cuda/std/__algorithm/move.h:54cuda/std/__algorithm/move_backward.h:53cuda/std/__bit/bit_cast.h:55cub/device/dispatch/kernels/kernel_histogram.cuh:81cub/detail/uninitialized_copy.cuh:33thrust/type_traits/is_trivially_relocatable.h:213cudax/include/cuda/experimental/__kernel/kernel_ref.cuh:54